Multi-Dimensional Text Classification
نویسندگان
چکیده
This paper proposes a multi-dimensional framework for classifying text documents. In this framework, the concept of multidimensional category model is introduced for representing classes. In contrast with traditional flat and hierarchical category models, the multi-dimensional category model classifies each text document in a collection using multiple predefined sets of categories, where each set corresponds to a dimension. Since a multi-dimensional model can be converted to flat and hierarchical models, three classification strategies are possible, i.e., classifying directly based on the multi-dimensional model and classifying with the equivalent flat or hierarchical models. The efficiency of these three classifications is investigated on two data sets. Using k-NN, naïve Bayes and centroidbased classifiers, the experimental results show that the multi-dimensional-based and hierarchical-based classification performs better than the flat-based classifications.
منابع مشابه
An Improved Flower Pollination Algorithm with AdaBoost Algorithm for Feature Selection in Text Documents Classification
In recent years, production of text documents has seen an exponential growth, which is the reason why their proper classification seems necessary for better access. One of the main problems of classifying text documents is working in high-dimensional feature space. Feature Selection (FS) is one of the ways to reduce the number of text attributes. So, working with a great bulk of the feature spa...
متن کاملExploiting Associations between Class Labels in Multi-label Classification
Multi-label classification has many applications in the text categorization, biology and medical diagnosis, in which multiple class labels can be assigned to each training instance simultaneously. As it is often the case that there are relationships between the labels, extracting the existing relationships between the labels and taking advantage of them during the training or prediction phases ...
متن کاملAn Improved Flower Pollination Algorithm with AdaBoost Algorithm for Feature Selection in Text Documents Classification
In recent years, production of text documents has seen an exponential growth, which is the reason why their proper classification seems necessary for better access. One of the main problems of classifying text documents is working in high-dimensional feature space. Feature Selection (FS) is one of the ways to reduce the number of text attributes. So, working with a great bulk of the feature spa...
متن کاملImproving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA
With the explosive growth in amount of information, it is highly required to utilize tools and methods in order to search, filter and manage resources. One of the major problems in text classification relates to the high dimensional feature spaces. Therefore, the main goal of text classification is to reduce the dimensionality of features space. There are many feature selection methods. However...
متن کاملMulti-label ASRS Dataset Classification Using Semi Supervised Subspace Clustering
There has been a lot of research targeting text classification. Many of them focus on a particular characteristic of text data multi-labelity. This arises due to the fact that a document may be associated with multiple classes at the same time. The consequence of such a characteristic is the low performance of traditional binary or multi-class classification techniques on multi-label text data....
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2002